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\  Abstract 

'"''"In  this  paper  we  presenti.some  of  the  basic  ideas  associated  with  the 

"  t'/  ,  ,  i . 

detection  of  abrupt  changes  in  dynamic  systems,  jarxf' presentation  focuses  on 
two  classes  of  methods  —  multiple  filter-based  techniques  and  residual-based 
methods  —  and  in  far  more  detail  on  the  multiple  model  and  generalized 
likelihood  ratio  methods.  Issues  such  as  the  effect  of  unknown  onset  time  on 
algorithm  complexity  and  structure  and  robustness  to  model  uncertainty  are 
discussed.  _  _  _ 
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1.  Introduction 

In  recent  years  many  techniques  have  been  proposed  for  the  detection  of 
abrupt  chnages  in  dynamic  systems.  These  efforts  have  been  motivated  by  a 
wide  variety  of  applications  includinq  the  detection  of  sensor  and  actuator 
failures  [1,  2,  4,  19,  26-35]  the  tracking  of  maneuvering  vehicles  [20,  21, 

23,  25],  and  numerous  signal  analysis  problems  (electrocardiogram  analysis 
[5,  6],  geophysical  signal  processing  [7],  edqe  detection  in  images  [8,  9), 
freeway  monitoring  [10,  11],...).  A  key  to  the  development  of  any  technique 
for  the  detection  of  abrupt  chanqes  is  the  modeling  of  how  the  abrupt  change 
affects  the  observed  signals.  In  some  applications  the  effect  of  the  abrupt 
change  is  direct  and  simple  —  e.g.  a  bias  developing  in  an  output  signal. 

In  such  problems  the  primary  focus  of  research  is  on  the  precise  nature  of 
the  decision  rule  (see,  for  example  [8,  9,  26]).  In  other  applications  (such 
as  those  described  in  [1,  2,  4,  10,  11,  19,  21]),  the  effect  on  the  observ¬ 
ables  is  described  in  a  more  complex,  indirect  way  —  for  example,  in  terms 
of  an  abrupt  change  in  the  dynamics  of  a  system.  In  such  problems  one  is 
presented  in  essence  with  two  problems:  the  processing  of  the  observed  sig¬ 
nals  in  order  to  accentuate  (and  simplify)  the  effect  of  the  abrupt  change 
and  the  definition  of  decision  statistics  and  rules  in  terms  of  the  processed 
outputs.  The  techniques  described  in  this  paper  in  principle  address  both 
of  these  issues  in  that  they  produce  sufficient  statistics  for 
optimum  detection.  However,  we  will  focus  for  the  most  part  on  the  first 
task  of  change  detection,  that  is,  the  problem  of  producing  signals  which  make 
subsequent  detection  as  easy  as  possible.  As  discussed  here  and  in  more 
detail  in  [27-29],  this  is  an  exceedingly  important  perspective  in  the  design 
of  detection  methods  which  are  robust  to  uncertain  details  of  the  dynamic 
models  on  which  they  are  based. 


In  [1]  a  variety  of  methods  and  structures  are  described  for  change 
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detection.  In  this  paper  we  focus  on  two  basic  and  extremely  important 
structures.  The  first  of  these  is  the  multiple  filter  structure  depicted 
in  Figure  1.  Here  the  observations,  y,  art-  processed  by  a  bank  of  filters 
each  of  which  is  based  on  a  particular  hypothesis  (o.g.  Filter  ffl  assumes  no 
change  has  occurred.  Filter  #2  assumes  a  particular  type  of  change  has 
occurred  possibly  at  a  particular  time,  etc.)  The  outputs  of  the  filters#  y, 
represent  signals  which  should  typically  be  small  if  the  corresponding  hypo¬ 
theses  are  in  fact  correct,  and  thus  the  decision  mechanism  in  essence  is 
based  on  determining  which  of  the  filters  is  doing  the  "best"  job  of  keeping 
the  corresponding  y's  small.  There  are  several  methods  that  have  been  de¬ 
veloped  which  fit  the  general  form  of  Figure  1.  In  particular,  hard  [331 
and  soft  [34]  voting  systems  can  be  interpreted  in  this  fashion.  Another 
example  is  the  multiple  observer  desiqn  described  in  [36].  In  the  next 
section  we  describe  in  detail  a  third  technique  of  this  general  type,  namely 
the  multiple  model  method. 

A  second  general  structure  for  the  detection  of  abrupt  changes  is  the 
residual-based  structure  illustrated  in  Figure  2.  In  this  case  a  filter  is 
designed  based  on  the  assumption  that  no  abrupt  change  has  occurred  or  will 
occur.  The  filter  produces  a  prediction  y  of  the  output  signal  y  based  on 
this  assumption  and  the  past  history  of  the  output,  and  this  prediction  is 
subtracted  from  the  actual  output  to  produce  a  residual  signal y  .  If  no 
abrupt  change  has  occurred,  Y  should  be  small.  Consequently  deviations 
from  this  behavior  are  indicative  of  failure,  and  it  is  on  this  fact  that 
the  decision  mechanism  is  based.  Again  there  are  a  variety  of  techniques 
of  this  general  form.  In  [35]  a  variety  of  statistical  tests  (chi-squared, 
whiteness,  etc.)  are  proposed  for  the  detection  of  abrupt  changes  when  the 
Y  are  the  innovations  from  a  Kalman  filter.  In  (30-32)  a  method  is  described 
for  the  choice  of  gain  in  an  observer-like  filter  in  order  to  guarantee  that 
the  decoupling  of  the  steady-state  effects  on  y  of  a  given  set  of  possible 
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abrupt  changes.  In  Section  3  we  discuss  a  third  technique  of  this  general 
type,  namely  the  generalized  likelihood  ratio  method. 

2.  The  Multiple  Model  (MM)  Method 


t 
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The  MM  method  was  originally  developed  for  problems  of  system  identifi¬ 
cation  and  adaptive  control  [12-17,  241,  and  in  the  initial  part  of  this 
sect,  on  we  follow  these  early  treatments.  Subsequently  we  will  look  more 
closdy  at  the  issues  that  arise  and  possible  adaptations  that  may  be 
necessary  for  the  use  of  MM  for  the  detection  of  abrupt  events  (see  (1,  2,  5, 
10,  18,  19,  22,  23]  for  further  developments) . 

The  basic  (#!  method  deals  with  the  following  problem.  We  observe  the 
inputs  u(k) ,  k  =  0,1,2,...  and  outputs  y(k) ,  k  =  1,2,...  of  a  system  which 
is  assumed  to  obey  one  of  a  given  finite  set  of  linear  stochastic  models, 
indexed  by  i  *  1,...,N: 

xi(k+l)  =  Ai(k)xi(k)  +  B.(k)u(k)  ♦  w±(k)  +  g^k)  (2.1) 

y  (k)  =  ^(kjx.tk)  +  vA(k)  +  b.(k>  (2.2) 

where  w^(k)  and  v^ (k)  are  independent,  zero-mean  Gaussian  white  noise  pro¬ 
cesses.  with 

E[wi(k)wi(j)'l  =  Qi(k)«  (2.3) 

E[vi(k)vi(j)'l  -  (2‘4) 

The  initial  state  x^O)  is  assumed  to  be  Gaussian,  independent  of  Wj 
and  v^,  with  mean  (0 ] 0)  and  covariance  P^(o|o)  (the  meaning  of  this  nota¬ 
tion  will  become  clear  in  a  moment).  The  matrices  A^(k),  B^(k),  C^(k), 
Qi(k),  and  R^(k)  are  assumed  to  be  known.  Also,  b^k)  and  g^(k)  are  given 
deterministic  functions  of  time  (corresponding  to  biases,  linearizations 
about  different  operating  points,  etc.).  In  addition,  the  state  vectors 
x^(k)  may  be  of  different  dimensions  for  different  values  of  i  (correspond¬ 
ing  to  assuming  that  the  different  hypothesized  models  represent  different 
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orders  for  the  dynamics  of  the  real  system) .  There  are  a  number  of  issues 
that  can  be  raised  concerning  this  formulation,  and  we  defer  our  critique  of 
the  MM  method  until  after  we  have  developed  its  basic  structure.  We  note 
here  only  one  technical  point  which  is  that  we  will  focus  on  a  discrete- 
time  formulation  of  the  MM  method.  Continuous- time  versions  can  be  found 
in  the  literature  (see  [24)),  and  they  differ  from  their  discrete-time 
counterparts  only  in  a  technical  and  not  in  a  conceptual  or  structural 
manner. 

Assuming  that  one  of  these  N  models  is  correct,  we  now  have  a  standard 
multiple  hypothesis  testing  problem.  That  is,  let  denote  the  hypothesis 
that  the  real  system  corresponds  to  the  ith  model,  and  let  p^(0)  denote  the 
a  priori  probability  that  is  true.  Similarly,  let  (k)  denote  the 
probability  that  IT  is  true  based  on  measurements  through  the  kth  measure¬ 
ment,  i.e.  given  *  {u(0) , . . . ,u(k-l) ,  y (1) , . . . ,y (k) } .  Then  Bayes'  rule 
yields  the  following  recursive  formula  for  the  Ik) 

p(y(k+l)|H  ,1  ,u(k))p  (k) 

Pi(k+1)  =  - - “ - 1 -  (2.5) 

l  p(y (k+1) |h. ,1  ,u(k))p  (k) 
j=l  3  k  3 

Thus,  the  quantities  that  must  be  produced  at  each  time  are  the  conditional 
probability  densities  p(y(k+l) |ir  .I^.ulk) )  for  i=l,...,N.  However,  con¬ 
ditioned  on  ,  this  probability  density  is  precisely  the  one  step  prediction 
density  produced  by  a  Kalman  filter  based  on  the  ith  model. 

That  is,  let  SLtk+ljk)  be  the  one-step  predicted  estimate  of  x^(k+H 
based  on  1^  and  u(k),  assuming  that  is  true.  Also  let  x^(k+l|k*l)  denote 
the  filtered  estimate  of  x^k+1)  based  on  Ifc+1*{lk,u(k) ,y (k+1) }  and  the  ith 
model.  Then  these  quantities  are  computed  sequentially  from  the  following 
equations! 


a^k+lll)  -  Ai(k)xi<k|k)  ♦  Bi(k)u(k)  ♦  g^k) 


(2.6) 


(2.7) 


xMk+ljk+l)  *  jL  (k+1 |k)  +  Ki(k+l)yi(k+l) 
where  (k+1)  is  the  measurement  innovations  process 

Y^k+1)  ■  y (k+1)  -  C^ktx  (k+l|k)  (2.8) 

and  K(k+1)  is  calculated  off-line  from  the  following  set  of  equations: 

P.(k+l|k)  «  Ai(k)Pi(k|k)A^(k)  +Q.(k>  (2.9) 

Vi(k+1)  -  Ci(k)Pi(k+l|k)C^(k)  +  R.(k)  (2.10) 

Ka (k+1)  =  Pi(k+l|k)C^(k)v"1(k+l)  (2.11) 

P.^k+llk+1)  «  P.(k+l|k)  -  Ki(k+l)Ci(k)P.(k+l|k)  (2.12) 


Here  (k+l | k)  denotes  the  estimation  error  covarinace  in  the  estimate 
fc^(k+l|k)  (assuming  to  be  true),  and  P^(k+l|k+l)  is  the  covariance  of 
the  error  xMk+1)  -  x^ (k+1 | k+l),  again  based  on  H..  Also  under  hypothesis 
Hi»Yi(k+l)  is  zero  mean  with  covariance  V^k+l),  and  it  is  normally  dis¬ 
tributed  (since  we  have  assumed  that  all  noises  are  Gaussian) .  Furthermore, 
conditioned  on  H^,  Ifc,  and  u(k) ,  y(k+l)  is  Gaussian,  has  mean  (k^fk+llk) 
and  covariance  Vi (k+1) .  Thus,  from  (2.8)  we  deduce  that 


p(y(k+l) |Hi,Ik>u(k) ) 


_ 1 _ 

(2iT)B/2(detVi  (k+l))1/2 


exp  {-  |  y* (k+1) V'1 (k+1) 

.  Yi(k+1)>  (2.13) 


where  m  is  the  dimension  of  y. 

Equations  (2.5)  -  (2.8)  and  (2.13)  define  the  MM  algorithm.  The  inputs 
to  the  procedure  are  the  y(k)  and  u(k),  and  the  outputs  are  the  p^OO.  the 
implementation  of  the  algorithm  can  be  viewed  as  consisting  of  a  bank  of  N 
Kalman  filters,  one  based  on  each  of  the  N  possible  models,  the  outputs  of 
these  Kalman  filters  are  the  innovations  sequences  (k+1) ,  which  effecti¬ 
vely  measure  how  well  each  of  the  filters  can  track  and  predict  the  behavior 
of  the  observed  data.  Specifically,  if  the  ith  model  is  correct,  than  the 
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one-step  prediction  error  Y^(k)  should  be  a  white  sequence,  resulting  only 
from  the  intrinsic  uncertainty  ir  the  ith  model.  However  if  the  ith  model 
is  not  correct,  then  Y^(k)  will  not  bo  white  and  will  include  errors  due  to 
the  fact  that  the  prediction  is  based  on  an  erroneous  model.  Thus  the  pro¬ 
bability  calculation  (2.5),  (2.13)  basically  provides  a  quantitative  way  in 
which  to  assess  which  model  is  most  likely  to  be  correct  by  comparing  the 
performances  of  predictors  based  on  these  models. 

Let  us  now  address  several  of  the  most  important  questions  that  arise 
in  understanding  how  the  MM  algorithm  should  be  used.  Clearly  a  very 
important  question  concerns  the  use  of  MM  in  problems  in  which  the  real 
system  is  nonlinear  and/or  the  noises  are  non-Gaussian.  The  answer  to  this 
problem  is  extremely  application-dependent.  The  Gaussian  assumption  is 
basically  used  in  one  place — i.e.  in  the  evaluation  of  p(y (k+1)  jH^I^fUfk) ) 
in  (2.13).  It  has  been  our  experience  that  using  this  formula,  even  when 
Y^(k+1)  is  non-Gaussian,  causes  essentially  no  performance  degradation.  As 
we  have  pointed  out,  what  MM  real ly  attempts  to  do  is  to  calculate  a  measure 
of  how  well  each  of  the  Kalman  fi  Iters  is  tracking  by  looking  at  the  predic¬ 
tion  errors  Y^k+1)  ,  and  the  p^(k)  are  simply  measure  of  how  well  each  of  the 
models  are  tracking  relative  to  e.»ch  other  and  to  how  well  we  would  expect 
them  to  be  tracking.  The  critical  term  in  (2.13)  in  general  is 

Yi(k+l)v"1(k+l)Yi(k+l)  (2.14) 

which  is  the  square  of  the  tracking  error  normalized  by  the  predicted  co- 
variance  of  these  errors  assuming  is  true.  Titus  if  this  quantity  is 
large,  we  would  tend  to  disregard  the  ith  model,  while  if  this  is  small,  the 
ith  filter  is  tracking  well.  The  p^(k)  exhibit  exactly  this  type  of  be¬ 
havior,  and  thus  we  can  expect  m  to  be  reasonably  robust  to  non-Gaussian 
statistics.  Of  course  this  depends  upon  the  application,  but  we  have  had 
good  success  in  several  applications  [5,  10]  in  which  the  noises  were 
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decidedly  non-Gaussian. 

As  far  as  the  ronlinearity  of  th  •  real  system  is  concerned,  an  obvious 
approach  is  to  linearize  the  system  about  a  number  of  operating  points  for 
each  possible  model  and  use  these  linearized  models  to  design  extended  Kal- 
|nan  filters  which  would  be  used  in  place  ol  Kalman  filters  in  the  MM  algor¬ 
ithm.  Again  the  uti lity  of  this  approach  defends  very  much  on  the  particu¬ 
lar  application.  Essentially  the  issue  is  whether  the  tracking  error  from 
the  extended  Kalman  filter  corresponding  to  the  linearized  model  "closest 
to"  the  true,  nonlinear  system  is  markedly  smaller  than  the  errors  from 
filters  based  on  "mere  distant"  models.  This  is  basically  a  signal-to-noise 
ratio  problem,  similar  to  that  seen  in  the  idealized  MM  algorithm  in  which 
everything  is  linear.  In  that  case  the  noise  is  measured  by  the  V^(k+1). 

The  Larger  these  art ,  the  harder  it  will  be  to  distinguish  the  models  (the 
quantity  in  (2.14)  hecomes  smaller  as  IT  is  increased,  and  this  ift  turn 
tends  to  flatten  out  (as  a  function  of  i)  the  probabilities  in  (2.k3)).  In 
the  nonlinear  case,  the  inaccuracies  of  the  extended  Kalman  filters  effecti¬ 
vely  increase  the  (k+1)  thus  reducing  their  tracking  capabilities  and 
making  it  more  difficult  to  distinguish  among  them.  Therefore,  the  perfor¬ 
mance  of  MM  in  this  case  will  depend  upon  how  "far  apart”  the  different 
models  are,  as  compered  to  how  well  each  of  the  trackers  tracks,  flie  farther 
apart  the  models  are ,  the  more  signal  we  have;  the  poorer  the  tracking 
performance  is,  the  more  difficult  it  is  to  distinguish  among  the  hypotheses. 

Even  if  the  trte  system  is  linear,  there  is  clearly  the  question  of  the 
utility  of  MM  given  the  inevitability  of  discrepancies  between  the  actual 
system  and  any  of  tie  N  hypothesised  models.  Again  this  is  a  question  of 
signal-to-noise  ratio,  but  in  the  linear  case  a  number  of  results  And  ap¬ 
proaches  have  been  developed  for  dealing  with  this  problem.  For  exwtple, 
Braa  [16]  has  developed  a  precise  mathematical  procedure  for  calculating 
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the  distance  between  different  linear  models,  and  he  has  shown  that  the  MM 
procedure  will  converge  to  the  model  closest  to  the  real  model  (i.*:.  p^(k)+l 
for  the  model  nearest  the  true  system) .  This  can  be  viewed  as  a  technique 
for  testing  the  robustness  of  MM  or  as  a  tool  that  enables  us  to  decide  what 
models  to  choose.  That  is,  if  the  real  system  is  in  some  set  of  models  that 
may  be  infinite  or  may  in  fact  represent  a  continuum  of  models  (corresponding 
to  the  precise  values  of  certain  parameters) ,  then  Baram's  results  can  be 
used  to  decide  upon  a  finite  set  of  these  models  that  span  the  original  set 
and  that  are  far  enough  apart  so  that  MM  can  distinguish  among  them.  For 
example,  in  adaptive  flight  control  (reference  [17])  we  may  be  interested 
in  determining  the  flight  condition  (operating  point)  of  an  aircraft,  and 
we  can  think  of  using  MM  by  hypothesizing  a  set  of  linearized  models  that 
span  the  flight  envelope. 

Let  us  now  turn  explicitly  to  the  problem  of  detecting  abrupt  changes. 

In  such  problems  one  must  deal  with  one  key  issue  that  we  have  not  yet 
discussed.  Specifically,  in  change  detection  we  are  not  simply  attempting 
to  determine  which  of  the  models  qiven  in  (2.1)  -  (2.4)  is  the  correct  one, 
but  rather  we  are  trying  to  detect  a  shift  from  one  model  to  another.  That 
is,  in  this  case  the  actual  system  obeys  a  model  of  the  form 

x(k+l)  *  A(k)x(k)  +  B(k)u(k)  +  w(k)  +  g(k)  (2.15) 

y(k)  -  C(k)x(k)  +  vtk)  +  b(k)  (2.16) 

where  for  each  k  the  parameters  of  the  model  correspond  to  one  of  the  hy¬ 
pothesized  models  in  (2.1)  -  (2.4),  but  the  model  may  change  with  time. 

While  this  possibility  is  not  directly  taken  into  account  in  the  MM  method 
as  described  to  this  point,  this  algorithm  often  does  work  well  in  detecting 
shifts  without  any  major  modification  to  take  this  possiblity  into  account 
(see.  for  example  [5,  10].  the  important  issue  in  this  is  the  adaptability 


-il- 


of  MM  and  the  purpose;  of  the  particular  application. 

To  elaborate  on  his,  let  us  first  note  that  MM  will,  theoretically, 
eventually  indicate  a  shift  from  one  model  to  another.  TVo  things,  however, 
must  be  taken  into  ac -ount.  In  the  first  place,  we  see  from  (2.5)  that  if 
Pi(k)  is  small,  the  p  (k+1)  wil  grow  only  slowly  at  best.  In  fact,  in 
practice  we  have  foun  1  that  num<  rical  roundoff  often  leads  to  p^(k)  being 
set  to  zero  if  the  ith  model  is  not  vaLid  up  to  time  k.  In  this  case  p^(j) 
will  be  zero  for  all  i  >  k.  In  order  to  avoid  this  drastic  effect  and  also 
the  extremely  sluggis  i  response  of  MM  to  a  chanqe  in  models,  a  lower  bound 
is  usually  set  on  the  p^(k).  In  different  appl- cations  we  have  found  bounds 
from  10  3  down  to  10  5  to  be  satisfactory,  wit!'  very  little  sensitivity  to 

the  precise  value  of  he  bound.  As  a  second  point  we  note  that  if  a  parti¬ 
cular  model  is  not  co-rect  up  until  time  k  the  Kalman  filter  based  on  this 
model  may  develop  lar je  errors.  If  then  this  model  becomes  correct  at  time 
k,  it  may  take  a  long  time  before  the  prediction  errors  (2.8)  decrease  to 
reflect  the  validity  jf  the  model.  From  (2.13)  and  (2.5)  we  see  that  this 
in  turn  means  that  MM  may  not  r<  spond  to  this  change  for  some  time.  In  pra- 
tice  we  have  found  th*t  this  is  not  a  particularly  bad  problem  if  the  errors 
in  all  of  the  Kalman  filters  rei  ain  bounded  even  when  the  model  on  which  they 
are  based  is  incorrect.  If  a  p.  rticular  real  system-mismatched  Kalman  fil¬ 
ter  combination  is  unstable,  th<  n  there  may  be  problems  if  the  system  switch¬ 
es  to  the  model  corresponding  tt  this  filter.  What  we  have  found  is  a 
workable  solution  to  this  problem  is  to  reset  the  estimates  of  potentially 
divergent  Kalman  filters  to  the  estimate  of  the  most  probable  model,  and 

this  is  done  whenever  the  probability  of  possibly  diverging  filters  falls 
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below  a  threshold  (such  as  10  ) . 

With  these  modifications  MM  will  respond  more  quickly  to  model  changes. 
Whether  this  is  adequate  depends  upon  the  application.  In  particular,  if 


fast  response  is  needed  for  control  purposes  or  because  additional  model 
shifts  are  possible,  then  one  may  wish  to  consider  a  problem  formulation 
that  explicitly  includes  model  switches.  Furthermore,  in  some  applications 
the  time  at  which  a  shift  occurs  is  exceedingly  important,  and  in  such  a 
case  one  may  again  prefer  to  use  such  an  explicit  formulation,  as  one  must 
in  applications  such  as  multi-object  tracking  [37]  in  which  keeping  track 
of  large  numbers  of  possibilities  is  crucial. 

In  the  next  section  we  describe  one  such  formulation,  and  in  the 

remainder  of  this  section  we  indicate  how  the  MM  formulation  can  be  modified 

to  incorporate  model  changes  and  what  the  cost  is  for  this  modification. 

Specifically  suppose  that  the  real  system  does  correspond  to  one  of  the 

models  (2.1)  -  (2.4)  for  each  k  but  that  the  model  may  change  from  time  to 

time.  Clearly  there  are  several  different  constraints  that  we  can  place  on 

the  possible  sequences  of  models.  For  example,  if  there  are  no  constraints, 
k+1 

then  there  are  N  possible  sequences  of  models  over  the  first  k  time  steps 
(any  of  N  at  k=0,  any  of  N  at  t=l,.,.).  Such  a  situation  arises,  for  ex¬ 
ample,  if  one  assumes  that  the  sequence  of  models  is  governed  by  a  finite- 
state  Markov  processes.  Such  models  have  be*n  considered  by  several  authors. 
See  for  example  (40-42]  in  which,  in  additioi  to  considering  the  problem  of 
estimation,  these  authors  also  consider  the  problem  of  identifying  the 
transition  probability  matrix  for  the  finite-state  process. 

On  the  other  hand,  in  many  problems  one  is  interested  in  detecting 
individual  abrupt  changes  which  are  sufficiently  separated  in  time  so  that 
they  can  be  detected  and  accounted  for  separately.  In  such  a  case  it  is 
reasonable  to  allow  only  those  sequences  that  start  with  one  particular 
model  (the  "normal"  model)  and  have  a  single  shift  to  any  of  the  other 
models.  In  this  case  there  are  (kN-k+1)  possible  sequences  up  to  time  k  — 
essentially  we  must  account  for  all  possible  failure  times. 
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The  MM  solution  for  any  such  set  of  possible  sequences  of  models  is 
conceptually  identical  to  that  discussed  previously,  except  here  in  principle 
we  must  design  a  Kalman  filter  for  each  allowable  sequence  of  models.  The 
residuals  from  these  filters  are  then  used  exactly  as  described  earlier  to 
compute  the  probabilities  for  all  hypothesized  sequences.  Since  the  nunber 
of  possible  sequences  and  thus  filters  grows  in  time,  some  method  for  prun¬ 
ing  the  tree  of  hypotheses  is  needed.  For  example,  we  can  think  of  throw¬ 
ing  away  very  unlikely  models.  A  variety  of  techniques  for  handling  such 
MM  trees  have  been  considered  in  the  litera' ure  [18,  19,  37].  While  this 
may  at  first  glance  appear  to  be  a  hopelessly  complex  solution  to  the  change 
detection  problem,  this  approach  is  not  without  merit.  Specifically,  as  in 
[19]  this  approach  often  provides  a  great  deal  of  insight.  Also,  the  imple¬ 
mentation  of  Kalman  filter  trees  is  net  only  within  the  realm  of  feasibility 
for  implementation  using  high  speed  digital  hardware,  but  it  is  also  un¬ 
avoidable  in  problems  such  as  multi-object  tracking. 

3.  The  Generalized  Likelihood  Ratio  (GLR)  Method 

The  starting  point  for  the  GLR  method  is  a  model  describing  normal 
operation  of  the  observed  signals  or  of  the  system  which  generated  them. 
Abrupt  changes  are  then  modeled  as  additive  disturbances  to  this  model  that 
begin  at  unknown  times.  While  there  are  strong  similarities  between  the  GLR 
and  MM  formulations  —  indeed  in  many  cases  one  can  use  either  approach  with 
success  —  the  structure  of  the  GLR  algorithm  is  significantly  different 
than  that  for  the  MM  technique.  As  just  discussed  for  MM,  we  will  look  .»t 
the  case  of  a  single  such  change,  the  assumption  being  that  abrupt  changes 
are  sufficiently  separated  to  allow  for  individual  detection  and  compensa¬ 
tion.  The  solution  to  the  problem  just  described  and  applications  of  the 
method  can  be  found  in  [1,  3,  5,  10,  20,  21,  25).  In  this  section  we  outline 
the  basic  ideas  behind  the  technique  and  discuss  some  of  its  properties. 
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We  assume  that  the  system  under  consideration  can  be  modeled  as 

x(k+l)  =  A(k)xlk)  +  B(k)u(k)  +  w(k)  +  f  (k,0)v  (3.1) 

y (k)  =  C (k) x (k)  +  v (k)  +  q  (k,0)\  (3.2) 

where  the  normal  model  consists  of  these  equations  without  the  f^  and  g^ 
terms.  These  terms,  f^(k,9)v  and  g.(k,<')v,  represent  the  presence  of  the 
ith  type  of  abrupt  change,  i=l,...,N.  Here  0  is  the  unknown  time  at  which 
the  failure  occurs  (so  f^k.S)  =  q^(k,0)  =  0  for  k  <  6),  and  f^  and  g^  are 
the  specified  dynamic  profiles  of  the  ith  charqe  type.  For  example,  if 
f^=0  and  g^=a  vector  whose  components  are  all  zero  except  for  the  jth  one 
which  equals  1  for  k  >  0,  then  this  corresponds  to  the  onset  of  a  bias  in 
the  jth  component  of  y.  Finally,  the  scalar  denotes  the  magnitude  of  the 
failure  (e.g.  the  size  of  a  bias)  which  we  can  model  as  known  (as  in  MM 
and  as  in  what  is  called  simplified  GLR  (SGLR) )  or  unknown. 

Assume  that  we  design  a  Kalman  filter  basej  on  normal  operation,  i.e. 
by  neglecting  and  q^.  From  the  previous  section  we  have  that  this  filter 
is  given  by 

x(k+l|k)  =  Aik) x(k | k'  +  B(k)u(k)  (3.3) 

x(k+l | k+1)  =  x (k+1 | k !  +  K(k+l)Y(k+l)  (3.4) 

Y (k+1)  =  y ( k -  1 )  -  C(k) x (k+1 |k)  (3.5) 

where  K,  P,  and  V  are  calculated  as  in  (2.9)  -  (2.12).  Suppose  now  that  a 
type  i  change  of  size  v  occurs  at  time  0.  Then,  because  of  the  linearity 


of  (2.1)  -  (3.5)  we  can  write 

x(k)  *  x„(k)  +  a. <k,0)v  (3.6) 

x(k|k)  -  xN(k|k)  +  Pctk.O)  .'  (3.7) 

8 (k+1 | k)  =»  xN(k+l|k)  +  p.(k+l,0)v  (3.8) 

Y(k)  -  ymOO  +  P.  (k,0) v  (3.9) 

N  1 


where  x ,  x ,  and  Y  are  the  responses  if  no  abrupt  change  occurs,  and  the 
N  N  N 
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other  terms  are  the  responses  due  solely  to  the  abrupt  change.  Straight¬ 
forward  calculations  yield  recursive  equations  for  theses  quantities: 


ax  (k+1,0)  =  AtkJa^k.G)  +  f  (k,0) ,  a  (0,0)  =  0  (3.10) 

8.  (k+1,0)  =  H-K(k+l)C(k+l)lM  (k+1,0)  +  K(k+1). 

i 

•[C(k+l)r».  (k+1,0)  +  q. (k+1,0)]  .  (3.11) 

p.(k+l,6)  =  A(k)8.(k,0),  6  (0-1,0)  =  0  (3.12) 

Pi(k,0)  =  C(k)[a.(k,0)  -  p  (k,0) ]  +  g  (k,0)  (3.13) 


The  important  point  about  these  quantities  is  that  they  can  be  pre¬ 
computed.  Furthermore,  by  its  definition,  VN(k)  is  the  innovations  under 
normal  conditions,  i.e.  it  is  zero-mean,  white,  Gaussian  with  covariance 
V(k).  Thus  we  now  have  a  standard  detection  problem  in  white  noise:  we 
observe  the  filter  residuals  y(k),  which  can  be  modeled  as  in  (3.9),  and  we 
want  to  detect  the  presence  of  a  change  (i.e.  that  k  >  0)  and  perhaps  de¬ 
termine  its  identity  i  and  estimate  its  time  of  occurrence  0  and  sise  v, 
if  the  latter  is  modeled  as  being  unknown.  The  solution  to  this  problem 
involves  matched  filtering  operations.  First,  define  the  precomputable 
quantities 

*  -1 

a(k,0,i)  =  Zp.'(j,e)V  A(j)p  (j,0)  (3.14) 

j-e 

This  has  the  interpretation  as  the  amount  of  information  present  in 
y(0),..»y(k)  about  a  type  i  change  occurring  at  time  6. 

Hie  on-line  GLR  calculations  consist  of  the  calculation  of 
k  -1 

d(k,6,i)  -  ZP!(j,0)v  <j)y(j)  (3.15) 

j-e 

which  are  essentially  correlations  of  the  observed  residuals  with  the 
abrupt  change  signatures  p^(j,8)  for  different  hypothesised  types,  i,  and 
times,  0.  If  v  is  known  (the  SGLR  case),  then  the  likelihood  of  a  type  i 
change  having  occurred  at  time  0  given  data  y(l) ,. . . »y(k)  is 


✓ 
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«•  (k,9,i)  =  2vd(k,0,i)  -  v  a(k,l',i) 

s 

If  v  is  unknown,  then  the  generalized  likelihood  for  this  change  is 


(3.16) 


Mk,P,i)  = 


d*(k,9,j) 

a(k,0,i) 


(3.17) 


and  the  maximum  likelihood  estimate  of  1  assuming  a  change  of  type  i  at 
time  9  is 


v(k,P,i)  = 


d(k,0 , i) 
a(k,t‘,i) 


(3.18) 


Thus  the  GLR  algorithm  consists  of  the  single  Kalman  filter  (3.3)  - 
(3.5),  the  matched  filter  operations  of  (3.15),  and  the  likelihood  calcu¬ 
lation  of  (3.16)  or  (3.17).  The  outputs  of  the  method  are  these  likeli¬ 
hoods  and  the  estimates  of  eq.  (<.18)  if  \>  is  modeled  as  unknown.  The 
basic  idea  behind  GLR  is  that  different  types  of  abrupt  changes  nroduce 
different  kinds  of  effects  on  tht  filter  innovations  —  i.e.  different 
signatures  —  and  GLR  calculates  the  likelihood  of  each  possible  event  by 
correlating  the  innovations  with  the  corresponding  signature. 

As  with  the  MM  method  a  number  of  issues  can  be  raised  about  GLR. 

Some  of  these,  such  as  the  effect  of  nonlinearities  and  robustness  to  model 
errors,  are  very  similar  to  the  f’M  case.  Essentially  it  still  can  be  viewed 
as  a  signal-to-noise  ratio  problem:  in  the  nonlinear  case  the  additive  de¬ 
composition  of  (3.9)  is  not  precisely  valid,  but  it  may  be  approximately 
correct.  Also,  different  failure  modes  can  be  distinguished  even  in  the  pre¬ 
sence  of  modelling  errors  if  their  signatures  are  different  enou<£i.  Again 
these  issues  depend  very  much  on  the  particular  application.  We  refer  the 
reader  to  (4,  6,  10,  11,  21,  25)  for  discussions  of  sev< ral  applications  of 


GLR  to  applications  in  which  these  issues  had  to  be  add:essed. 

GLR  has  been  successfully  applied  to  a  wide  variety  of  applications, 
such  as  failure  detection  (1,  4],  geophysical  signal  analysis  (7),  detecting 
arrhythmias  in  electrocardiograms  (6),  freeway  incident  detection  (10,  11), 
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and  maneuver  detection  [20,  21,  25].  Note  that  the  model  used  in  (3.1), 

(3.2)  for  such  changes  is  an  additive  model.  Thus  it  appears  on  the  surface 
that  the  types  of  abrupt  changes  that  can  bt  detected  by  GLR  are  a  special 
subset  of  those  that  can  be  detected  by  since  (2.1),  (2.2)  allow  para¬ 
metric  changes  (in  A,  B,  C,  Q,  R)  as  well  a;;  additive  ones.  There  are 
several  points,  however,  that  must  be  taken  into  account  in  assessing  and 
comparing  MM  and  GLR: 

(1)  The  price  one  pays  for  allowing  parametric  changes  in  MM  is  the 
necessity  of  implementing  banks, of  Kalman  filters,  and  actually 
trees  of  such  filters  to  account  ‘or  switches  between  models.  GLR, 
on  the  other  hand,  requires  a  si ngle  Kalman  filter  and  a  growing 
number  of  correlation  calculation:,  as  in  (3.15),  which  in  principle 
must  be  calculated  for  i=l,...,N  and  0=1,..., k.  He  will  comment 
shortly  on  the  computational  issues  concerned  with  these  correla¬ 
tions,  but  for  now  we  simply  point  out  that  they  are  typically  far 
less  involved  than  the  calculations  inherent  in  Kalman  filters 
(see  [4,  6,  7]  for  examples  of  how  simple  these  calculations  can 
be) .  Also,  because  it  operates  on  the  outputs  of  a  normal  mode 
filter,  GLR  can  be  easily  implemented  and  attached  as  a  monitor  to 
an  already  existing  system. 

(2)  Extensions  to  the  GLR  method  can  lie  developed  for  the  detection  of 
parametric  changes  [38],  This  extended  GLR  bears  some  similarity 
to  extended  Kalman  filtering  and  iterated  extended  Kalman  filtering. 

(3)  It  has  been  our  experience  that  a  GLR  system  based  on  the  detection 
of  additive  effects  can  often  also  detect  parameter  failures.  FOr 
example,  a  gain  change  in  a  sensor  does  look  like  a  sensor  bias, 
albeit  one  that  is  modulated  by  the  value  of  the  variable  being 
sensed.  That  is,  any  detectable  change  will  exhibit  a  systematic 
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deviation  between  what  is  obsi-rvcd  .md  what  is  predicted  to  In- 
observed.  Obviously,  the  ability  of  GLR  to  detect  a  parametric 
change  when  it  is  looking  for  additive  ones  is  again  a  question 
of  robustness.  If  the  effect  of  the  parametric  change  is  "close 
enough”  to  that  of  the  additive  one,  the  system  will  work.  This 
has  been  the  case  in  all  of  our  expirience.  In  particular  we 
refer  the  reader  to  [41  for  an  additive-failure-based  design  that 
has  done  extremely  well  in  detecting  gain  changes  in  sensors.  Note 
of  course  that  in  this  mode  GLR  is  essentially  only  indicating 

A 

an  alarm  —  i.e.  the  estimate  v  of  the  "bias"  is  meaningless,  Dut 
in  many  detection  problems  our  primary  interest  is  in  simply 
identifying  which  of  several  types  of  changes  has  occurred. 

There  are  several  final  issues  that  should  be  mentioned  in  discussing 
GLR.  The  first  concerns  the  calculation  of  statistical  measures  of  per¬ 
formance  of  GLR.  As  mentioned  in  the  preceding  section,  Baram  [16]  has 
developed  a  method  for  measuring  the  distance  >>e tween  models  and  hence  a 
measure  of  the  detectability  and  distinguisha) ility  of  different  failure 
modes.  Similar  calculations  can  be  performed  for  GLR,  but  in  this  case  it  is 
actually  simpler  to  do  and  interpret,  as  we  can  use  standard  detection- 
theoretic  ideas.  Specifically,  a  direct  measure  of  the  detectability  of  a 
particular  type  of  change  is  the  information  .i(k,0,i)  defined  in  (3.14). 

This  quantity  can  be  viewed  as  the  correlation  of  p^(j,0)  with  itself  at 
zero  lag.  Similarly,  we  can  determine  the  relative  distinguishability  of  a 
type  i  change  at  two  times  0^  and  02  as  the  correlation  of  the  corresponding 
signatures 

k  -1 

a(k,6  ,6  ,i)  ■  I  p! ( j ,0. )V  ( j) p. ( j ,0J  (3.19) 

1  2  j«max(0  ,9  )  11  12 
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and  the  relative  distinguishability  of  type  i  and  m  changes  at  tines  6^  and 
©2  similarly: 

k 

a(k,0  ,0  ,i,m)  =  £  »  '  ( j  )  V-1  (j) p  (j,0,)  (3.20) 

j=max(0l(02)  11  m  2 

These  quantities  provide  us  with  extremely  useful  information.  For  example, 
in  some  applications  16-9)  the  estimation  of  the  time  0  at  which  the  change 
occurs  is  critical,  and  (3.19)  provides  information  about  how  well  one  can 
resolve  the  onset  time.  In  failure  detection  applications  these  quantities 
directly  provide  us  with  information  about  how  system  redundancy  is  used  to 
detect  and  distinguish  failures  and  can  be  used  in  deciding  whether  addition¬ 
al  redundancy  (e.g.  more  sensors)  are  needed.  Also,  the  quantities  in  (3.14), 
(3.19),  and  (3.20)  directly  give  the  statistics  of  the  likelihood  measures 

(3.16),  (3.17).  For  the  SGLR  case  of  (3.16),  £  is  Gaussian,  and  its  mean 

s 

2 

under  no  failure  is  -v  a(k,0,i),  while  if  a  type  m  failure  occurs  at  time  $, 
its  mean  is 


EU2(k,0,i)  |  (m,<D)] 


v2  [2a(k,0,<>,i,m)  -  a(k,6,i)J 


(3.21) 


For  example  if  (m,i)  =  (i,0)  —  i.e.  if  the  precise  failure  and  time  assumed 

2 

in  the  calculation  of  £.  (k,9,i)  are  true,  then  its  mean  is  +v  a(k,0,i).  In 

s 

the  case  of  (3.17),  under  no  failure  £(k,6,i)  is  a  chi-squared  random  vari¬ 
able  with  1  degree  of  freedom,  while  if  a  failure  (m,$)  of  size  ■>  occurs 
£(k,6,i)  in  non-central  chi-squared  with  mean 


Eii<k,e,i)  |  <■.,♦)]  =  i  +  v 


(3.22) 


Clearly  these  quantities  can  be  very  useful  in  evaluating  the  performance  of 
GLR  detection  algorithms  and  for  determining  decision  rules  based  on  the 
GLR  outputs.  If  one  were  to  follow  the  precise  GLR  philosophy  (39),  the 
decision  rule  one  would  use  is  to  choose  at  each  time  k  the  largest  of  the 
tg(k,P,i)  or  J,(k,0,i)  over  all  possible  change  types  i  and  onset  times  0. 


I 


-20- 


This  largest  value  would  then  be  compared  to  a  threshold  for  change  detec¬ 
tion,  and  if  the  threshold  is  exceeded  the  corresponding  maximizing  values 
of  0  and  i  are  taken  as  the  estimates  of  chanqe  type  and  time.  While  such 

a  simple  rule  works  in  some  cases  [6,  21),  it  is  worthwhile  often  to  consider 

♦ 

more  complex  rules  based  on  the  I's.  For  example,  persistence  tests  (i.e. 

8.  must  exceed  the  threshold  over  some  i ime  per.od)  are  often  used  to  cut 
down  on  false  alarms  due  to  spurious  and  unmodoled  events.  See  [4,  7,  9,  261 
for  more  discussion  of  decision  rules. 

A  final  issue  to  be  mentioned  is  the  pruning  of  the  tree  of  possibi¬ 
lities.  As  in  the  MM  case  in  principle  we  hav<‘  a  growinq  number  of  calcu¬ 
lations  to  perform,  as  d(k,0,i)  must  be  calculated  for  i=l,...,N  and  all 
possible  change  tines  up  to  the  present,  i.e.  H»l,...,k.  What  is  usually 
done  is  to  look  only  over  a  sliding  window  of  possible  times: 

k-Mx  <  0  <  k-M2  (3.23) 

where  and  M2  are  chosen  based  on  the  a's  --  i.e.  on  detectability  and 
distinguishability  considerations.  Basically  after  M2times  steps  from  the 
onset  of  change  we  have  collected  enough  information  so  that  we  may  make 
a  detection  with  a  reasonable  amount  of  accuracy.  Further,  after  time 
steps  we  will  have  collected  a  sufficient  amount  of  information  so  that 
detection  performance  is  as  good  as  it  can  be  (i.e.  there  is  no  point  in 
waiting  any  longer).  Clearly  we  want  M^,  large  to  allow  for  maximum 
information  collection,  but  we  want  them  small  for  fast  response  and  for 
computational  simplicity.  This  is  a  typical  tradeoff  that  arises  in  all 
change  detection  problems. 
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